A Tight Lower Bound Instance for k-means++ in Constant Dimension

Authors

  • Anup Bhattacharya
  • Ragesh Jaiswal
  • Nir Ailon
Abstract

The k-means++ seeding algorithm is one of the most popular algorithms for finding the initial k centers when using the k-means heuristic. The algorithm is a simple sampling procedure and can be described as follows: Pick the first center uniformly at random from the given points. For i > 1, pick a point to be the i-th center with probability proportional to the square of the Euclidean distance from this point to the closest of the (i − 1) previously chosen centers. The k-means++ seeding algorithm is not only simple and fast but also gives an O(log k) approximation in expectation, as shown by Arthur and Vassilvitskii [7]. There are datasets [7, 3] on which this seeding algorithm gives an approximation factor of Ω(log k) in expectation. However, it is not clear from these results whether the algorithm achieves a good approximation factor with reasonably high probability (say 1/poly(k)). Brunsch and Röglin [9] gave a dataset where the k-means++ seeding algorithm achieves an O(log k) approximation ratio with probability that is exponentially small in k. However, this and all other known lower-bound examples [7, 3] are high-dimensional. So, an open problem was to understand the behavior of the algorithm on low-dimensional datasets. In this work, we give a simple two-dimensional dataset on which the seeding algorithm achieves an O(log k) approximation ratio with probability exponentially small in k. This solves open problems posed by Mahajan et al. [13] and by Brunsch and Röglin [9].
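The sampling procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the names `sq_dist` and `kmeanspp_seed` are hypothetical, points are assumed to be tuples of numbers, and the uniform fallback for the degenerate case where every point already coincides with a chosen center is a choice made here, not something the abstract specifies.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from `points` by squared-distance sampling.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    # First center: chosen uniformly at random from the input points.
    centers = [rng.choice(points)]
    # d2[j] holds the squared distance from points[j] to its closest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    for _ in range(1, k):
        if sum(d2) == 0:
            # Assumption: if all points coincide with chosen centers,
            # fall back to a uniform choice to avoid zero total weight.
            nxt = rng.choice(points)
        else:
            # Sample the next center with probability proportional to d2.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update each point's distance to its closest chosen center.
        d2 = [min(d, sq_dist(p, nxt)) for d, p in zip(d2, points)]
    return centers
```

For example, `kmeanspp_seed([(0, 0), (0, 1), (10, 0), (10, 1)], 2)` returns two of the input points. A point that is already a center has weight zero, so the second center differs from the first whenever the input contains at least two distinct points.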


Similar resources

A bound for Feichtinger conjecture

In this paper, using the discrete Fourier transform in the finite-dimensional Hilbert space C^n, a class of non-Rieszable equal-norm tight frames is introduced and, using this class, a bound for the Feichtinger Conjecture is presented. By the Feichtinger Conjecture, which has been proved recently, for given A, C > 0 there exists a universal constant delta > 0 independent of n such that every C-equal...


Sample Complexity of Testing the Manifold Hypothesis

The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimizatio...


The mixing time for simple exclusion

We obtain a tight bound of O(L log k) for the mixing time of the exclusion process in Z/LZ with k ≤ L/2 particles. Previously the best bound, based on the log Sobolev constant determined by Yau, was not tight for small k. When dependence on the dimension d is considered, our bounds are an improvement for all k. We also get bounds for the relaxation time that are lower-order in d than previous...


A Bad Instance for k-Means++

k-means++ is a seeding technique for the k-means method with an expected approximation ratio of O(log k), where k denotes the number of clusters. Examples are known on which the expected approximation ratio of k-means++ is Ω(log k), showing that the upper bound is asymptotically tight. However, it remained open whether k-means++ yields an O(1)-approximation with probability 1/poly(k) or even wi...


A Tight Lower Bound for High Frequency Moment Estimation with Small Error

We show an Ω((n^{1−2/p} log M)/ε^2) bits of space lower bound for (1 + ε)-approximating the p-th frequency moment F_p = ‖x‖_p^p = ∑_{i=1}^{n} |x_i|^p of a vector x ∈ {−M, −M+1, ..., M}^n with constant probability in the turnstile model for data streams, for any p > 2 and ε ≥ 1/n^{1/p} (we require ε ≥ 1/n^{1/p} since there is a trivial O(n log M) upper bound). This lower bound matches the space complexity of an upper bound of Gangu...



Publication date: 2014